Beyond Translation Memories
Author
Abstract
One key to the success of EBMT is the removal of the boundaries limiting the potential of translation memories. To bring EBMT to fruition, researchers and developers have to go beyond the self-imposed limitations of what is now traditional, in computing terms almost old-fashioned, TM technology. Experiments have shown that the probability of finding exact matches at phrase level is higher than the probability of finding exact matches at the current TM segment level. We outline our implementation of a linguistically enhanced translation memory system (or Phrasal Lexicon) implementing phrasal matching. This system takes advantage of the huge and underused resources available in existing translation memories and develops a traditional TM into a sophisticated example-based machine translation engine which, when integrated into a hybrid MT solution, can yield significant improvements in translation quality.

Translation Memories and EBMT

Having kept translators up at night worried about the future of their profession, machine translation as we knew it can now safely be declared obsolete and dead. However, since this initial threat to well-established work practices, translation has never been the same. New developments in computer-assisted translation, more specifically the emergence of translation memory (TM) applications, have continued to keep translators on their guard.

TMs – sophisticated search & replace engines?

Initially, computational linguists often chose to simply ignore TM technology, considering it some type of sophisticated search and replace engine and not a subject for serious research efforts. The developers of these systems, on the other hand, many of them coming from an academic background and therefore familiar with the latest developments in computational linguistic research, recognised the value of existing research and decided that it was time to apply it in practice.
For example, bilingual text alignment was declared largely ‘solved’ in the early 90s by Gale and Church (1991), but it was never applied and proven in practice until translation memory developers built their alignment utilities some years later [1]. Only recently, driven by increased activity in the area of example-based machine translation (EBMT), has the interest shown by the linguistic tools industry in research results been reciprocated by the research community. One possible reason for this development is that, although EBMT as a paradigm has been described in research papers as far back as 1984 (Nagao etc.) and although it managed to capture the interest and enthusiasm of many researchers, it has so far failed to reach the level of maturity where it could be transformed from a research topic into a technology used to build a new generation of machine translation engines, and new approaches, technologies and applications are badly needed in MT.

[1] It is interesting to note here that what was deemed to be solved in theory turned out to present quite considerable problems when applied in practice (Schäler, 1994).

Unlocking the potential of TMs

We believe that the time is ripe for the transformation of EBMT into demonstrators, technologies and eventually commercially viable machine translation engines along the lines suggested by Schäler (1996) and Macklovitch (2000), which are both based on the belief that existing translations contain more solutions to more translation problems than any other available resource (Isabelle et al., 1993). The key to the success of this development, we suggest, is the removal of the boundaries limiting the potential of translation memories. To bring EBMT to fruition, researchers and developers have to go beyond the self-imposed limitations of what is now traditional, in computing terms almost old-fashioned, TM technology.
EBMT and the Phrasal Lexicon

EBMT has been proposed as an alternative to and replacement for RBMT, initially by (Nagao, 1984), followed by extensions reported in (Sato & Nagao, 1990) and (Sadler & Vendelmans, 1990). EBMT has also been proposed as a solution to specific translation problems, as reported in (Sumita & Iida, 1991). The enormous variety of approaches to, the focus of, and the motivations for the use of examples in natural language processing (NLP) are testimony to the high level of interest in EBMT. Taking existing parallel texts as their starting point, researchers have worked on [2]:

- Word-sense disambiguation (Brown et al., 1991) and translation ambiguity resolution (Doi, 1992) and (Uramoto, 1994);
- Lexicography, e.g. the identification and translation of technical terminology (Dagan and Church, 1994), the development of an instant lexicographer (Karlgren, 1994); generally, the acquisition of lexical knowledge through the structural matching of bilingual sentences (Utsuro et al., 1994);
- Extraction of bilingual collocations or translation patterns from parallel corpora, non-aligned as in (Fung, 1995), (Rapp, 1995) and (Tanaka and Iwasaki, 1996), or aligned and using a linguistic (Matsumoto et al., 1993), statistical (Kupiec, 1993; Smadja et al., 1991; Smadja, 1993; Smadja et al., 1996), or, indeed, a combined approach (Kumano and Hirakawa, 1994); special attention to the problems of extracting bilingual collocations for Asian languages is given by (Haruno, 1996) and (Shin, 1996);
- Translation Quality Measures (Su et al., 1992);
- Extensions to and variations of the basic idea of EBMT, proposing Pattern-based Machine Translation (Maruyama, 1993) and (Takeda, 1996), Transfer-Driven Machine Translation (Furuse and Iida, 1994), Statistical Machine Translation (Brown et al., 1993), Machine Translation based on Translation Templates (Kaji et al., 1992) and (Kinoshita, 1994), and Translation Patterns (Watanabe, 1993) and (Watanabe, 1994).

[2] The work quoted in this bullet list can, unfortunately, not be fully referenced in this article, for practical reasons. Most of the reports mentioned were published in the proceedings of ANLP, COLING and ACL.

One idea, however, precedes all of the approaches mentioned and, surprisingly, has so far not been taken up by researchers to any significant degree: that of the Phrasal Lexicon, described by Joseph Becker (1975).
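The claim that exact matches are more likely at phrase level than at segment level can be illustrated with a toy sketch. The code below is not the system described in this article: it assumes plain whitespace tokenisation and uses contiguous word n-grams as a stand-in for linguistically motivated phrases, and the sample segments, function names and `max_len` parameter are all invented for illustration.

```python
from typing import List, Set, Tuple

Phrase = Tuple[str, ...]


def ngrams(tokens: List[str], max_len: int) -> Set[Phrase]:
    """Return all contiguous token sequences of length 1..max_len."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    }


def build_phrase_index(segments: List[str], max_len: int = 3) -> Set[Phrase]:
    """Collect every n-gram found in the source side of a toy TM."""
    index: Set[Phrase] = set()
    for segment in segments:
        index |= ngrams(segment.lower().split(), max_len)
    return index


def exact_segment_match(segments: List[str], sentence: str) -> bool:
    """Classic TM lookup: the whole segment must already be stored."""
    return sentence.lower() in {s.lower() for s in segments}


def exact_phrase_matches(index: Set[Phrase], sentence: str,
                         max_len: int = 3) -> List[str]:
    """Sub-segment lookup: multi-word n-grams of the sentence seen before."""
    found = ngrams(sentence.lower().split(), max_len) & index
    multi_word = [p for p in found if len(p) > 1]
    # Longest phrases first, then alphabetical, for a stable result.
    return [" ".join(p) for p in sorted(multi_word, key=lambda p: (-len(p), p))]


# Invented sample data (hypothetical source-side TM segments).
SEGMENTS = ["Click the Save button", "Open the File menu"]
NEW_SENTENCE = "Click the File menu"

segment_hit = exact_segment_match(SEGMENTS, NEW_SENTENCE)
phrase_hits = exact_phrase_matches(build_phrase_index(SEGMENTS), NEW_SENTENCE)
```

In this toy case the new sentence has no exact match at segment level, yet several of its phrases, including "the file menu", are already present in the memory. A real phrasal lexicon would also need phrase boundaries that make linguistic sense and an alignment between source and target phrases, both of which this sketch leaves out.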
Publication date: 2001